Skip to content

feat(telemetry): add streaming JSON export writer#960

Open
EhabY wants to merge 5 commits into
mainfrom
feat/issue-903-export-telemetry-json-writer
Open

feat(telemetry): add streaming JSON export writer#960
EhabY wants to merge 5 commits into
mainfrom
feat/issue-903-export-telemetry-json-writer

Conversation

@EhabY
Copy link
Copy Markdown
Collaborator

@EhabY EhabY commented May 17, 2026

Summary

  • Add writeJsonArrayExport that streams telemetry events to disk as a JSON array. Chunks are piped through stream/promises pipeline so memory stays flat regardless of maxTotalBytes (default 100 MB, no enforced ceiling).
  • Extract a shared src/util/fs.ts module hosting tempFilePath, renameWithRetry, and a new writeAtomically(path, write, onCleanupError?) helper that captures the temp-file + rename + best-effort cleanup pattern.
  • The new helper replaces CliCredentialManager.atomicWriteFile (16 lines -> 4) and backs the export writer. SSH config / support bundle / CLI manager keep their bespoke flows and just update imports.

Refs #903.

Stack: 2 / 4. Base: #959. Next: #961.

Why streaming

Export ranges include "All time", which reads everything on disk. With maxFileBytes and maxTotalBytes user-tunable and unbounded, a buffered writer would peak at ~2x the export size in memory and OOM at high caps. Streaming holds memory constant at ~12 lines of code vs ~10 for the buffered version.

Changes

  • src/telemetry/export/writers/json.ts (new): writeJsonArrayExport returns the event count; uses serializeTelemetryEvent from the shared wire format module so the output cannot drift from the on-disk shape.
  • src/util/fs.ts (new): tempFilePath, renameWithRetry, writeAtomically (with optional onCleanupError for diagnosing orphaned temp files).
  • src/util.ts: drops the moved helpers.
  • src/core/cliCredentialManager.ts: atomicWriteFile now defers to writeAtomically and passes its logger so cleanup failures are still logged.
  • src/telemetry/export/range.ts: exports parseTelemetryTimestampMs so the writer suite can build wire-format-equivalent events from the shared factory.
  • test/unit/util/fs.test.ts (new): suite for the three helpers, including atomic-write success, partial-write rollback, and callback return values.
  • test/unit/telemetry/export/writers/json.test.ts (new): uses memfs, the shared createTelemetryEventFactory, and the shared serializeTelemetryEvent. Covers success, empty input, and a mid-stream failure leaving the destination untouched with no orphan temp file.
  • test/unit/telemetry/export/files.test.ts: switched to a uniform memfs mock setup.
  • test/mocks/asyncIterable.ts (new): shared helper that wraps a sync array as an AsyncIterable yielding one item per microtask.
  • test/unit/util.test.ts: moved-out blocks removed.
  • test/unit/core/supportBundleLogs.test.ts: vi.mock target retargeted to @/util/fs.

@EhabY EhabY self-assigned this May 17, 2026
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 199d2e7 to 3a29ac9 Compare May 17, 2026 19:42
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-core branch from 9d7d4ce to 3c02990 Compare May 17, 2026 19:42
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 3a29ac9 to eb776a7 Compare May 18, 2026 17:03
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-core branch from 3c02990 to 350d662 Compare May 18, 2026 17:03
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from eb776a7 to ba095a9 Compare May 18, 2026 19:05
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-core branch from ca1bd07 to c0e517d Compare May 19, 2026 08:47
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from ba095a9 to 8066586 Compare May 19, 2026 10:00
@EhabY EhabY changed the title feat(telemetry): add JSON export writer feat(telemetry): add streaming JSON export writer and shared atomic-write helper May 19, 2026
@EhabY EhabY changed the title feat(telemetry): add streaming JSON export writer and shared atomic-write helper feat(telemetry): add streaming JSON export writer May 19, 2026
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 3d77d32 to c9c3b74 Compare May 19, 2026 11:02
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-core branch 2 times, most recently from d550ffd to aca8405 Compare May 20, 2026 10:27
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from c9c3b74 to 9c5a19d Compare May 20, 2026 10:28
@EhabY EhabY requested a review from mafredri May 20, 2026 10:32
Base automatically changed from feat/issue-903-export-telemetry-core to main May 20, 2026 13:48
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 9c5a19d to 914025b Compare May 20, 2026 13:49
@EhabY
Copy link
Copy Markdown
Collaborator Author

EhabY commented May 21, 2026

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid extraction. The writeAtomically helper genuinely deduplicates the temp-file + rename + cleanup pattern, the streaming writer holds O(1) memory via pipeline, and the wire-format reuse through serializeTelemetryEvent prevents format drift. Tests are real (memfs-backed, non-vacuous assertions, success + empty + error paths). CI green.

Four P3s, one nit.

The sharpest finding is a convergent one: three reviewers independently flagged that "propagates midstream errors" only checks error propagation, not destination integrity. The PR description claims the latter. The atomicity property IS tested at the writeAtomically layer, but the integration boundary (streaming pipeline error -> atomic cleanup) is the interesting edge. Two lines fix it.

The cleanup-logging regression from the cliCredentialManager extraction is worth knowing. The old code logged temp-file cleanup failures; the new utility silently swallows them. The design choice is documented and reasonable (generic utility shouldn't take a logger), but on Windows where antivirus locks are real, orphaned .tmp-* files now accumulate without a trace.

"Huh, streaming with highWaterMark: 1 is clean. Memory stays flat regardless of export size." -- Killua

🤖 This review was automatically generated with Coder Agents.

Comment thread test/unit/telemetry/export/writers/json.test.ts Outdated
Comment thread src/util/fs.ts Outdated
Comment thread test/mocks/asyncIterable.ts Outdated
Comment thread src/util/fs.ts
Comment thread src/telemetry/export/writers/json.ts
Comment thread src/util/fs.ts Outdated
@EhabY
Copy link
Copy Markdown
Collaborator Author

EhabY commented May 21, 2026

/coder-agents-review

@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 034f5bc to 5a9edee Compare May 21, 2026 15:13
Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All six R1 findings addressed cleanly in 034f5bc. The midstream test now verifies destination integrity, the onCleanupError callback restores diagnostic visibility, doc comments are accurate, and the suffix is consistent. Good fix round.

Two P3s on the new onCleanupError parameter itself.

The callback was the right design choice for DEREM-2, but the implementation has two gaps: it's untested (so the fix has no regression guard), and a throwing callback silently swallows the original write error despite the doc promising otherwise. Both are small fixes.

"The callback is the fix. An untested fix is an unreliable fix." -- Bisky

🤖 This review was automatically generated with Coder Agents.

Comment thread test/unit/util/fs.test.ts
Comment thread src/util/fs.ts Outdated
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch 3 times, most recently from 1671ea2 to 5cbd8ba Compare May 21, 2026 15:51
EhabY added 3 commits May 21, 2026 18:53
Move `tempFilePath` and `renameWithRetry` from `src/util.ts` into a
focused `src/util/fs.ts`, alongside a new `writeAtomically(path, write)`
helper that captures the temp-file + rename + best-effort cleanup
pattern previously copy-pasted into the telemetry writer and
`CliCredentialManager.atomicWriteFile`. Both call sites now use the
shared helper; the SSH config / support bundle / CLI manager keep their
bespoke flows and just update imports.

Rewrite `writeJsonArrayExport` to stream chunks through
`Readable.from(...)` into `createWriteStream` via `stream/promises`
`pipeline`. Memory stays flat regardless of `maxTotalBytes`, which has
no enforced ceiling and can comfortably exceed the default 100 MB cap.

Tests for `tempFilePath` and `renameWithRetry` move into
`test/unit/util/fs.test.ts` alongside a new `writeAtomically` suite
covering success, partial-write rollback, and callback return values.
- Move src/telemetry/export/writers.ts to writers/json.ts and the
  matching test to writers/json.test.ts. The writer keeps the same
  shape; the doc comment is tightened and the test drops a
  redundant atomic-failure assertion already covered by
  writeAtomically's own tests.
- Extract parseTelemetryTimestampMs from range.ts's
  isTimestampInRange so other export writers can reuse it.
- Add a trivial asyncIterable test helper to test/mocks/.
- Inline the duplicated `vi.mock("node:fs", ...)` /
  `vi.mock("node:fs/promises", ...)` factory pair (9 lines per file)
  as a 2-line dynamic import in the affected test files.

Upstream-compatible groundwork for the OTLP writer branch (#961);
isolates move + helper-extraction churn so that PR's diff stays
focused on OTLP-specific work.
EhabY added 2 commits May 21, 2026 18:53
- Add a required onCleanupError callback to writeAtomically and
  writeJsonArrayExport so callers cannot silently lose temp-file
  cleanup diagnostics (matters on Windows, where AV/Indexer locks
  can leave large export orphans). cliCredentialManager passes its
  logger to restore the warn-on-cleanup behavior that existed
  before the extraction.
- Wrap the rm+callback chain in a try/catch so a throwing
  onCleanupError can't replace the original write error.
- Document that writeAtomically requires the parent directory to
  exist.
- Use suffix "temp" so the call matches the doc example and the
  cliManager convention.
- Cover the cleanup-failure path with new tests: callback invoked
  with the rm error, and a throwing callback still surfaces the
  writer's error.
- Tighten the json writer's midstream-error test to assert the
  destination survives and no temp file is left behind.
- Fix the asyncIterable helper's docstring: it exercises async
  iteration, not stream backpressure.
After the 999->888 takeover, the 888 monitor would eventually fail
network reads, call processLost, then searchForProcess would find
888 again and emit ssh.process.recovered (same-PID rediscovery).
On slow CI runners that race won the negative assertion at
sshProcess.test.ts:469. Return [] from find after the takeover so
no rediscovery is possible.
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 5cbd8ba to d8a9b52 Compare May 21, 2026 15:54
@EhabY
Copy link
Copy Markdown
Collaborator Author

EhabY commented May 21, 2026

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 8 prior findings addressed and verified. Netero clean, panel clean (Mafuuu, Bisky, Chopper). No new findings.

The onCleanupError callback evolution across rounds landed well: optional in R2, hardened against throwing callbacks in R3, now required so callers make an explicit choice about diagnostic visibility. The try/catch wrapping preserves the "original error always rethrown" contract. Tests cover all branches including the callback-throws path.

The sshProcess test fix (replacing mockResolvedValue with mockResolvedValueOnce + empty fallback) eliminates a race where the monitor loop could rediscover the replaced PID and emit a spurious recovery event.

🤖 This review was automatically generated with Coder Agents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant